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SEMICOARSENING  AND  IMPLICIT  SMOOTHERS  FOR  THE  SIMULATION  OF  A 

FLAT  PLATE  AT  YAW* * * § 


RUBEN  S.  MONTEROt,  IGNACIO  M.  LLORENTE+ ,  AND  MANUEL  D.  SALAS§ 

Abstract.  This  paper  presents  a  full  multigrid  solver  for  the  simulation  of  flow  over  a  yawed  flat  plate. 
The  two  problems  associated  with  this  simulation;  boundary  layers  and  entering  flows  with  non-aligned 
characteristics,  have  been  successfully  overcome  through  the  combination  of  a  plane-implicit  solver  and 
semicoarsening.  In  fact,  this  multigrid  algorithm  exhibits  a  textbook  multigrid  convergence  rate,  i.e. ,  the 
solution  of  the  discrete  system  of  equations  is  obtained  in  a  fixed  amount  of  computational  work,  indepen¬ 
dently  of  the  grid  size,  grid  stretching  factor  and  non-alignment  parameter.  Also,  a  parallel  variant  of  the 
smoother  based  on  a  four-color  ordering  of  planes  is  investigated. 

Key  words,  plane  implicit  smoothers,  semicoarsening,  robust  multigrid,  flat  plate 

Subject  classification.  Applied  and  Numerical  Mathematics 

1.  Introduction.  The  flow  of  a  viscous  fluid  over  a  solid  obstacle  can  be  divided  into  two  regions  of 
interest.  A  very  thin  layer  close  to  the  surface  of  the  body  in  which  the  velocity  gradient  normal  to  the 
surface  is  very  large,  and  the  remaining  region  where  no  such  gradients  occur  and  the  influence  of  viscosity 
can  be  neglected.  It  can  be  shown  from  several  exact  solutions  of  the  Navier-Stokes  equations  that  the 
thickness  of  the  boundary  layer  is  proportional  to  the  square  root  of  the  kinematic  viscosity.  Hence,  in  the 
simulation  of  high  Reynolds  number  flows,  a  high  density  of  nodes  must  be  concentrated  near  the  body 
surface  to  capture  the  viscous  effects  numerically. 

It  is  well  known  that  standard  multigrid  algorithms  suffer  from  a  slow-down  in  convergence  in  such  an 
anisotropic  situation  (see  for  example  [1,  15]).  There  are  two  main  approaches  to  deal  with  these  anisotropic 
operators.  The  first  approach  consists  in  improving  the  smoothing  process  by  using  an  alternating  direction 
block-implicit  smoother  [13].  This  algorithm  explores  all  the  possible  directions  of  coupling  of  the  variables. 
On  the  other  hand,  the  second  approach  relies  on  improving  the  coarse-grid  operator.  Algorithms  like 
selective  coarsening  [7],  flexible  multiple  semicoarsening  [26]  or  block  implicit  relaxation  combined  with 
semicoarsening  [6],  among  others,  fall  into  this  category.  Although  these  methods  have  been  successfully 
applied  to  fully  elliptic  equations  [17]  and  the  2-D  Navier-Stokes  equations  [18,  23]  their  application  to  the 
Navier-Stokes  equations  in  3-D  has  been  limited. 

The  simulation  analyzed  in  this  work  represents  an  entering  flow  type  with  the  characteristics  entering 
through  one  boundary.  If  the  flow  does  not  recirculate,  downstream  marching  results  in  a  very  efficient  solver 
for  the  convective  operator.  However,  if  the  main  stream  velocities  are  not  aligned  with  the  grid  lines  the 
efficiency  of  the  multigrid  method  degenerates  dramatically.  In  this  case,  error  components  that  are  much 
smoother  in  the  characteristic  direction  than  in  others,  are  not  well  approximated  in  coarser  grids.  The 
main  reason  is  the  increasing  numerical  viscosity  induced  on  coarser  levels  [3,  9].  One  way  to  prevent  this 
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degradation  in  convergence  is  to  use  block  implicit  relaxation  combined  with  semicoarsening  [8].  This  first 
approach  has  been  applied  to  the  3-D  constant  coefficient  convection  equation  in  [14].  When  solving  the 
Navier-Stokes  equations  it  is  also  possible  to  use  distributive  Gauss-Seidel  relaxation  (DGS)  [3,  22].  The 
DGS  decouples  the  elliptic  component  of  the  system  (solved  effectively  by  multigrid)  from  the  non  elliptic 
one  which  can  be  solved  through  marching. 

The  two  problems  involved  in  the  simulation  studied  here,  namely  boundary  layers  and  non-aligned 
grids  with  open  characteristics,  have  been  pointed  out  as  one  of  the  factors  that  prevent  the  achievement 
of  optimal  multigrid  efficiencies  in  CFD  codes  [2].  Optimal  convergence  is  defined  as  the  resolution  of  the 
governing  system  of  equations  in  a  few  (less  than  ten)  work  units.  A  work  unit  is  usually  defined  as  the 
time  required  to  compute  the  residual  of  the  system  in  the  finest  grid.  This  property  is  defined  by  Brandt 
as  Textbook  Multigrid  Efficiency  (TME)  [1].  Another  desirable  quality  of  a  multigrid  solver  is  its  robustness. 
The  robustness  of  a  smoother  is  defined  as  its  ability  to  efficiently  solve  a  wide  range  of  problems.  In  this 
sense  the  definition  of  robustness  is  qualitative  and  has  to  be  defined  more  precisely  by  setting  up  a  set  of 
suitable  test  problems.  In  the  present  context  we  will  characterize  the  multigrid  algorithms  as  robust  if  the 
solution  of  the  governing  system  of  equations  can  be  attained  in  a  fixed  amount  of  work  units  independent 
of  the  grid  size,  grid  stretching  factor  and  the  non-alignment  parameter.  We  will  refer  to  this  property  as 
Textbook  Multigrid  Convergence  (TMC). 

The  purpose  of  this  work  is  to  present  a  multigrid  algorithm  which  achieves  textbook  convergences  for 
the  simulation  of  the  flow  over  a  yawed  flat  plate.  In  order  to  solve  the  two  problems  outlined  above  we 
present  in  section  3  a  FMG-FAS  multigrid  algorithm  based  on  a  plane  implicit  smoother  combined  with 
semicoarsening.  The  numerical  results  analyzed  in  section  4  show  that  the  algorithm  used  in  this  work  is 
fully  robust  for  the  model  problem  considered.  In  that  section,  we  will  also  investigate  a  four  color  ordering 
of  planes  which  enables  the  parallel  implementation  of  the  smoother.  The  paper  ends  with  some  conclusions. 

2.  Finite  Volume  Discretization.  Let  us  consider  the  dimensionless  steady  state  incompressible 
Navier-Stokes  equations: 

(u  •  V)u  =  -  Vp  +  -5-Au, 

Re 

(2.1)  V  •  u  =  0, 

where  u  E  Me 3  =  (u,v,w)  is  the  non-dimensional  velocity  field  and  p  is  the  dimensionless  pressure.  Re  is 
the  Reynolds  number  defined  as  Re  =  UoovL ,  where  LGo  is  a  characteristic  velocity,  L  a  characteristic  length 
and  v  the  kinematic  viscosity. 

The  system  of  non-linear  equations  2.1  is  discretized  over  an  orthogonally  structured  grid  with  a  staggered 
arrangement  of  unknowns,  where  the  velocity  field  is  defined  on  the  control  volume  faces,  and  the  pressure 
field  at  the  center  (see  figure  2.1).  The  most  important  issues  of  the  finite  volume  technique  will  be  briefly 
repeated  for  the  u  momentum  equation.  Integration  of  the  convective  terms  of  2.1  over  a  control  volume 
CVijk  gives: 

(2.2)  /  u  •  Vu  dV  =  \  /  u(u  *  n)  dS  =  ra/.'U/.,  k  =  e,  w,  s,  n,  t,  b; 

Jcvijk  Y  ^dcyijk  k 

where  n  is  the  outward  normal  vector  to  the  CVijk  faces,  and  the  indexes  e,  w, ...  stand  for  the  usual  cardinal 
notation  (see  [10]).  As  an  example,  let  us  consider  the  east  face.  For  the  approximation  of  the  u  velocity 
an  upwind  biased  definition  is  used,  the  low  order  or  driver  operator  used  to  iterate  the  solution  is  a  pure 
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Top  Face  (t) 


Fig.  2.1.  Placement  of  the  unknowns  in  the  control  volume  (left-hand  chart).  Approximation  of  the  velocity  at  the  surface 
of  the  control  with  the  QUICK  scheme  (right-hand  chart). 


upwind: 


(2.3) 


ue 


Uijk  +5+  (u  •  n)e  >  0; 

Ui+ijk  +  S~  (u  •  n)e  <  0. 


where  the  S±  terms  corresponds  to  the  contribution  of  higher  order  operator  corrections.  The  S±  source 
terms  are  calculated,  within  the  multigrid  cycle  as  explained  in  section  3,  using  a  second  order  QUICK  [12] 
scheme.  With  QUICK,  the  velocity  at  the  surface  of  the  control  volume  is  interpolated  by  fitting  a  parabola 
to  the  values  of  the  velocity  at  three  consecutive  nodes:  the  two  nodes  located  on  either  side  of  the  surface 
of  interest,  plus  the  adjacent  node  in  the  upstream  direction  (see  figure  2.1).  The  mass  fluxes  can  be 
easily  evaluated  using  linear  weighted  interpolation.  The  rest  of  the  details  about  the  discretization  of  2.1 
have  been  discussed  in  detail  in  [16]. 


3.  The  Multigrid  Algorithm.  The  base  solver  that  we  have  employed  in  this  work  is  a  full  multigrid 
FMG  algorithm  [1].  Let  us  define  a  set  of  grids  G  =  {£lk  :  k  =  0, 1,2,  where  Qo  is  the  finest  target 

grid  and  the  rest  of  the  grids  are  obtained  by  applying  some  coarsening  procedure.  In  the  FMG  algorithm 
the  calculations  start  on  the  coarsest  grid  fi/y  Once  the  problem  is  solved,  the  solution  is  interpolated  to 
the  next  finer  level  fi/v-i  to  provide  a  good  initial  approximation  to  the  discrete  problem  on  that  level.  This 
procedure  is  repeated  until  the  finest  grid  Qo  is  reached.  The  main  goal  of  a  FMG  algorithm  is  to  provide 
an  approximation  u°  of  the  discrete  solution  u°  up  to  an  algebraic  error  \\u°  —  u°\\  which  is  smaller  than  the 
discretization  error  \\u  —  u°||. 

Because  of  the  non-linearity  of  the  Navier-Stokes  equations,  each  level  in  the  FMG  process  is  solved 
with  some  full  approximation  scheme  (FAS)  [1]  multigrid  cycles.  The  FAS  cycle  for  a  given  grid  Vtn  can  be 
recursively  defined  as  follows;  let  us  consider  the  non-linear  discrete  problem  on  Qn: 


(3.1) 


Lnun  =  r, 


After  applying  v\  iterations  of  a  non-linear  smoother  to  the  system  3.1  a  new  approximation  un  is  obtained. 
Now,  the  approximation  un  and  the  residual  rn  =  fn  —  Lnun  are  transfered  to  the  next  coarser  grid  On+i: 

un+ 1  =  /™+1un; 

(3.2)  rn+1  =  /™+1rn, 

the  restriction  operators  /™+1  are  discussed  below.  On  the  grid  Qn+i  the  defect  equation  is  solved: 

T  ,  'T/n+1  —  T‘n+1  _u  T 
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(3.3) 


un+ 1  =  un+1  +  A  un+1, 


where  Ln+ 1  is  some  coarse-grid  approximation  to  Ln.  Note  that  in  the  FAS  algorithm  we  solve  for  the 
full  approximation  un+1  rather  than  for  the  correction  A un+1 .  The  approximate  solution  to  the  coarse-grid 
problem  3.3  is  obtained  in  7  multigrid  cycles  for  the  grid  fln+i.  When  the  coarsest  level  CIn  is  reached, 
the  solution  to  3.2  is  obtained  by  several  relaxation  sweeps  (i/q)  of  the  smoothing  process.  Once  the  system 
3.3  is  solved,  the  correction  A un+1  is  transferred  back  to  the  finer  grid  Q,n  and  added  to  the  fine-grid 
approximation: 

(3.4)  un  <-  un  +  I™+1(un+1  -  un+1 ), 

so  sweeps  of  the  non-linear  smoothing  process  are  applied  to  the  system  3.1,  using  the  new  solution  un 
as  the  initial  guess.  In  algorithm  1  a  recursive  implementation  of  the  FAS  cycle  is  shown. 

Algorithm  1  FAS(z7,z/2,7,n)  multigrid  cycle  for  a  given  grid  where  17  and  v 2  denote  the  number  of  pre 
and  post-smoothing  iterations.  The  cycle  type  is  fixed  with  7. 

if  n=N  then 

uN  =  Smooth(Z/7v,  uN ,  fN,  Vo) 
else 

un  =  Smooth(Ln,  un,  /n,  v\) 

Tn  ^ —  fn  —  LnUn 

rn+ 1  jn+lrn 

Un+ 1  <r- 

/n+ 1  <r-  rn+1  +  Ln+1un+1 
for  i  =  0  to  7  do 

FAS(z/i,z/2,7>™  +  1) 

end  for 

un  ^  un  +  /^+1(^n+1  -  un+1) 
un  =  Smooth (Ln,  fin, /n,  z/2) 

end  if 


The  multigrid  cycle  is  characterized  by  the  number  of  pre  and  post-smoothing  iterations  (27,^2),  and 
7  which  sets  the  order  in  which  the  grids  are  visited.  Depending  on  7,  the  cycle  is  denoted  by  V^i,^)  if 
7  =  1  and  by  W(z/i,z/2)  if  7  =  2.  We  will  also  consider  an  F-cycle,  which  corresponds  to  an  index  between 
the  V  and  W-cycles,  i.e.  1  <  7  <  2.  In  figure  3.1  the  flowchart  of  the  cycles  used  in  this  work  are  shown.  In 
general,  a  growing  7  implies  an  increasing  complexity  [27,  24]  and  more  smoothing  sweeps  on  coarser  levels 
which  harms  the  parallel  properties  of  the  cycle.  However,  low  7  cycles  (i.e.,  V-cycles)  are  known  to  be  less 
robust  than  W-cycles,  specially  in  convection  dominated  problems  [18,  3].  This  is  one  of  the  reasons  why  in 
practice,  F-cycles  are  often  used  as  a  trade-off  between  V  and  W-cycles.  Note  also  that  using  semicoarsening 
as  defined  subsequently,  the  storage  requirement  of  the  multigrid  algorithm  is  twice  that  of  the  single  grid 
algorithm. 

3.1.  Restriction  and  Prolongation.  Solution  and  residuals  transfers  are  dictated  by  the  staggered 
arrangement  of  unknowns  and  the  coarsening  procedure  used.  In  the  following  we  will  consider  a  fine  grid 
Qh  defined  by  the  nodes: 

fth  =  {x  G  IR3  :  x  =  kh,  k  =  (i,j,k),  h  =  (hx,hv,hz),  i  =  0,...,nx, 
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nh 

^2h 

^8h 

Fig.  3.1.  Scheme  of  a  V-cycle  V(i/i,i/2)  (left-hand  chart)  and  an  F-cycle  F(v  1,1/2)  where  represents  the  number  of 
iterations  of  the  smoother  performed  to  solve  the  coarsest  level 

j  =  0  ,  .  .  .  ,  fly  ,  k  =  0  ,  .  .  .  ,  Tlz  ,  hx  =  1  /  fix  5  hy  =  1  /  fly  ,  hz  =  1  /  flZ  J  }  5 

and  a  coarse  grid  fi<h+i  obtained  by  semicoarsening  from  With  semicoarsening  there  is  only  one  direction 
in  which  the  mesh  space  is  doubled,  let’s  say  for  example  Hx  =  2 hx,Hy  =  hy,Hz  =  hz  or  equivalently 
Nx  =  nx/2,  Ny  =  fly,  Nz  =  nz.  Hence  all  Fourier  modes  in  the  y  and  2  directions  can  be  exactly  represented 
on  Q,h+i  and  the  smoother  has  only  to  damp  components  of  the  error  oscillating  highly  in  the  x  direction. 

The  restriction  operator  /£+1  in  equation  3.2  is  used  to  restrict  values  from  the  fine  grid  Vth  to  the  coarse 
grid  Clh+ 1-  The  component  of  the  velocity  parallel  to  the  direction  in  which  the  coarsening  is  performed  is 
transfered  using  injection,  while  the  other  two  components  together  with  the  scalar  field  are  restricted  using 
linear  weighted  interpolation: 

(Jh^  ^2 ijk  —  H ijki 

(I%+1v)2ijk  =  (Sx~V2ijk  +  8x+V2i+ljk)/ 

(I^+1w)2ijk  =  (Sx~W2ijk  +  5x+W2i+ljk)/Ax, 

{Ih+1p)2ijk  =  ( Sx~P2ijk  +8x+P2i+ljk)/^X, 

Ax  —  %2i-\-2jk  %2  ijk  8x~^~  —  %2i-\-2jk  X2i-\-ljk  8x  —  X2i-\-ljk  %2  ijki 

Vi  G  [0,  Nx  —  1]  Vj  G  [0,  Ny  —  1]  Vk  G  [0,  Nz  —  1\. 

Note  that  because  of  the  staggered  arrangement  of  unknowns,  the  velocity  component  parallel  to  the  coars¬ 
ened  direction  is  treated  in  a  vertex-centered  way,  while  the  rest  of  the  variables  are  transferred  as  cell- 
centered. 

The  operator  /^+1  in  equation  3.4  is  used  to  transfer  data  from  a  coarser  grid  Qh+i  to  the  finer  Qh.  In 
this  case  the  prolongation  is  a  weighted  linear  interpolation  for  the  vertex  centered  variable: 

^2 ijk  —  11 ijki 

(Jh  n)2i+ljfe  —  {8x  Uijk  T  8x  Ui-^-ijk  )Ax, 

Ax  =  X2i-\-ljk  %2i—ljk  Sx^  =  X2i-\-ljk  ^2  ijk  8x  =  X2  ijk  %2i—ljki 

Vi  G  [0,  Nx  —  1]  Vj  G  [0,  Ny  —  1]  Vke[Q,Nz-l]. 

The  cell  centered  variables  are  treated  using  weighted  linear  interpolation,  for  example,  for  the  v  component 
of  the  velocity  field  we  have: 

(I^+1v)2ijk  =  (0 .56x+vijk  +  ( 6x~  +0.55x+)vi-1jk)/Ax 
(lk+1v)2i-ijk  =  ((Sx+  +  0.55x~)vijk  +  0.5Sx~Vi-ljk)/ Ax 
Ax  =  X2i-\-2jk  ^2  ijk  Sx~^~  =  X2i-\-2jk  X2i-\-\jk  Sx  =  X2i-\-\jk  X2  ijki 

Vi  G  [0,  Nx  -  1]  Vi  G  [0,  Ny  -  1]  Vfc  G  [0,  Nz  -  1]. 
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When  values  outside  the  computational  domain  are  needed  in  the  above  set  of  formulas  they  are  extrapolated 
from  the  inner  nodes  with  the  help  of  the  boundary  condition. 

3.2.  Smoothing  Operator.  The  present  approach  uses  a  coupled  smoother  where  the  momentum 
and  continuity  equation  are  satisfied  simultaneously.  In  particular,  we  have  chosen  a  cell-implicit  Symmetric 
Coupled  Gauss  Seidel  (SCGS)  method  as  the  base  of  the  smoothing  process.  This  smoother  was  introduced 
by  Vanka  [25]  and  subsequently  considered  in  [23,  16].  In  the  SCGS  the  CV  are  scanned  in  some  prescribed 
order,  then  for  each  CV  the  continuity  and  momentum  equations  are  relaxed  as  follows: 

1.  The  mass  fluxes  of  the  momentum  equations  for  the  six  cell  faces  of  the  CV  are  calculated.  Also  in  this 
stage  the  corrections  made  by  the  QUICK  scheme  are  updated  based  on  the  current  approximation. 
This  is  equivalent  to  a  local  Picard  linearization.  Considering  implicitly  only  the  diagonals  of  the 
momentum  equations  and  those  terms  multiplying  the  pressure  inside  the  CV,  we  can  build  the 
following  system: 
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(3.5) 

where  the  Ru^v^w  terms  represents  the  contribution  of  the  explicit  variables. 


2.  The  system  3.5  is  easily  solved  using  Gaussian  elimination.  A  more  implicit  version  of  the  system 
(3.5)  that  includes  off-diagonal  elements  in  the  first  six  rows  is  also  possible,  which  corresponds  to 
considering  implicitly  all  the  references  to  unknowns  inside  the  CV.  However,  the  convergence  factor 
is  similar  and  the  system  is  more  expensive  to  solve  than  3.5  [23,  16]. 

3.  The  velocity  components  and  the  pressure  of  the  CV  are  updated  using  under-relaxation: 

un+1  =un +  ouu(u-un) 
pn+ 1  =pn+oop(p-pn). 

In  the  following  simulations  the  under-relaxation  factors  have  been  fixed  as  ujp  =  1.0  and  uou  =  0.8. 
However,  in  general  the  optimum  values  of  uou  are  strongly  problem  dependent  and  have  to  be  set 
empirically. 

3.3.  Plane  Implicit  Smoothers.  The  use  of  highly  anisotropic  grids  is  common  practice  in  the  field 
of  CFD.  Grid  nodes  are  usually  concentrated  in  certain  regions  of  the  computational  domain  for  accuracy 
reasons  or  to  capture  small  scale  physical  phenomena  such  as  the  boundary  layers  mentioned  before.  In  some 
situations,  when  the  direction  of  the  anisotropies  is  known  beforehand,  the  multigrid  convergence  can  be 
improved  using  an  implicit  smoother  in  the  direction  of  strong  coupling  of  the  unknowns  [1].  These  implicit 
solvers  have  been  widely  studied  in  previous  work  in  the  simulation  of  the  incompressible  Navier-Stokes 
equations,  see  for  example  [22,  19,  4,  5]. 

However,  if  the  stretched  grid  generates  aspect  ratios  whose  relative  magnitudes  vary  for  different  parts 
of  the  computational  domain,  the  multigrid  techniques  based  on  block-wise  smoothers  combined  with  full 
coarsening  fail  to  smooth  error  components  [1,  27].  In  these  situations,  the  problem  can  be  effectively  solved 
with  a  block  implicit  smoother  combined  with  semicoarsening.  In  particular,  in  the  following  simulations  we 
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will  use  an  z-semicoarsening  (i.e. ,  doubling  mesh  space  only  in  the  2  direction  as  explained  in  section  3.1) 
combined  with  a  x?/-plane  implicit  solver. 

From  an  architectural  point  of  view,  it  is  interesting  to  note  that  multigrid  methods  often  reach  a  disap¬ 
pointingly  small  percentage  of  their  theoretically  available  CPU  performance  due  to  a  limited  cache  reuse. 
Some  authors  have  successfully  improved  locality  using  different  techniques  such  as  data  access  transforma¬ 
tions  and  data  layout  transformations  [21].  Although  we  have  not  introduced  any  of  these  techniques,  we 
should  remark  that  plane  smoothers  exploit  blocking  in  an  implicit  way,  whereas  point  smoothers  have  to 
perform  global  sweeps  through  data  sets  that  are  too  large  to  fit  in  the  cache.  Despite  not  having  discussed 
cache  memory  exploitation  in  this  paper,  in  a  previous  paper  [20]  we  have  included  a  data  reuse  analysis  for 
a  similar  robust  multigrid  algorithm  applied  to  an  anisotropic  diffusion  equation. 

Block  implicit  smoothers  are  usually  based  on  a  direct  solver.  These  implementations  take  advantage  of 
the  relatively  small  size  of  the  corresponding  implicitly  solved  1-D  problem.  The  3-D  counterpart  does  not 
present  this  possibility  since  the  size  of  the  2-D  system  is  no  longer  small  enough  to  consider  a  direct  solver. 
However,  an  exact  direct  solver  for  the  planes  is  not  needed,  as  has  been  shown  in  [13]  for  the  3-D  Poisson 
equation  and  in  [19]  for  the  incompressible  Navier-Stokes  equations.  This  consideration  drastically  reduces 
the  computational  cost  of  the  overall  algorithm  compared  to  that  of  a  direct  plane  solver. 

In  the  present  work,  the  planes  will  be  approximately  solved  with  a  2-D  multigrid  algorithm  consisting 
of  one  FAS  V(l,l)  cycle  (see  figure  3.1).  The  same  kind  of  anisotropies  found  in  the  3-D  problem  may 
appear  in  the  2-D  system.  Thus  a  robust  multigrid  algorithm  is,  again,  necessary.  In  particular  a  cell-wise 
SCGS  smoother  described  in  section  3.2  combined  with  semicoarsening  has  been  found  fully  robust  for  the 
simulation  of  the  yawed  flat  plate.  However,  for  greater  2-D  problem  sizes  a  point-wise  smoother  may  not  be 
fully  robust,  and  an  implicit  line  smoother  or  a  greater  7  cycle  should  be  used  [3,  16].  The  other  components 
of  the  2-D  multigrid  cycle,  such  as  the  restriction  and  prolongation  operators  or  the  smoothing  process  can 
be  easily  deduced  from  those  derived  in  sections  3.1  and  3.2. 


Slab  of  cells. 


Fig.  3.2.  Slab  of  cells  updated  simultaneously  when  using  the  xy-plane  implicit  smoother. 

The  coupled  philosophy  of  the  SCGS  will  also  be  applied  in  the  plane  solver.  The  plane  smoother 
simultaneously  relaxes  the  momentum  and  continuity  equations  of  the  cells  included  in  the  plane.  Note  that 
the  plane  is  understood  as  a  slab  of  cells  as  shown  in  figure  3.2.  Hence,  all  velocity  components  and  pressures 
contained  within  the  plane  will  be  updated  at  the  same  time.  Let  us  consider  for  example  a  yz- plane,  for 
which  the  procedure  to  solve  the  2-D  problem  on  the  plane  is  as  follows: 

1.  The  mass  fluxes  and  second-order  corrections  of  the  2-D  system  are  computed  based  on  the  current 
solution  in  the  plane.  This  step  corresponds  to  a  global  linearization  of  the  2-D  problem.  Let  us 
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define  the  vector  Xi  that  accommodates  the  variables  for  the  whole  plane  of  cells  as: 

XI  =  (u>  u+,  v,  w,  p)  with 

u  =  {uijk  :  i  =  I,  Vk,j  £  [0,  n]},  u+  =  {uijk  :  i  =  I  +  1,  \/k,j  £  [0,n]}, 

v  =  {vijk  :  i  =  I,  Vk,j  £  [0,  n]},  w  =  {wijk  :  i  =  I,  \/k,j  £  [0,n]}, 

P  =  {Pijk  :  i  =  I,  Vk,j  £  [0,n]}. 

The  system  of  equations  for  the  plane  can  be  written  in  terms  of  residuals  and  corrections  as: 

(3.6)  Li  A Xi  =  n, 

where  Ti  —  ji  —  LiXi  is  the  residual  of  the  ith  yz- plane  and  A  Xi  =  X™+1  —  X™  is  the  increment  of 
the  solution. 

2.  System  3.6  is  solved  by  applying  one  FAS  V(l,l)  multigrid  cycle  as  explained  before.  In  the  following 
simulations  the  coarsest  level  is  solved  with  10  iterations  of  the  smoothing  process  (i/q  =  10). 

3.  The  solution  in  the  plane  is  updated  via  underrelaxation: 

xp+ 1  =  X f  +  wA  Xi, 

LO  —  (uju,  UJu,  UJu,  UJui  i 

where  uou  and  uop  are  the  underrelaxation  factors  for  the  velocity  and  pressure  field  defined  in  section 
3.2. 

Many  types  of  plane  smoothers  can  be  easily  constructed  by  defining  a  specific  ordering  of  the  planes. 
It  is  important  to  note  that  the  second  order  operator  extends  the  stencil  to  the  planes  i,  i  ±  1  and  i  ±  2 
depending  on  the  direction  of  the  flow.  In  order  to  avoid  these  dependencies,  the  construction  of  the  smoother 
can  be  based  on  a  four  color  ordering  of  planes  (see  figure  3.3).  Because  a  four  color  smoother  should  be 
used  as  the  basis  of  a  parallel  implementation,  the  behavior  of  this  smoother  will  be  analyzed  and  compared 
to  the  sequential  lexicographic  order  in  section  4. 


Fig.  3.3.  Planes  orderings  for  the  implicit  plane  smoother. 

4.  Numerical  Experiments.  We  consider  a  cascade  of  square  plates  of  side  L  as  shown  in  figure 
4.1.  In  particular,  we  assume  that  the  leading  edge  of  the  plate  is  placed  at  a  distance  L  from  the  inflow 
plane  of  the  computational  domain.  The  outflow  boundary  is  situated  at  2 L  after  the  trailing  edge.  The 
boundary  conditions  are  defined  as  follows:  on  the  west  face  of  the  computational  domain  we  prescribe 
a  inflow  conditions  consistent  with  some  angle  of  yaw  (0).  The  north  and  south  faces  satisfy  a  periodic 
boundary  condition.  In  order  to  assure  periodicity  in  the  y  direction,  we  will  only  consider  angles  of  yaw 
such  that  there  is  no  interaction  between  the  wakes  of  the  plates.  The  largest  angle  of  yaw  studied  in  this 
work  will  be  0  =  45°.  For  this  situation  the  wake  of  the  plates  (\\u\\  <  1)  at  2  =  0  has  been  shaded  in  figure 
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4.1.  Periodicity  in  the  y  direction  is  clearly  shown  for  this  case,  and  also  therefore  for  the  angles  0 °  <0  <  45° 
used  in  the  following  simulations.  Note  that  for  0  =  0°  the  periodic  boundary  condition  reduces  to  symmetry 
on  the  north  and  south  boundary.  On  the  plate  we  impose  a  no-slip  condition  while  the  top  and  bottom 
faces  will  hold  symmetric  boundary  conditions. 


Periodicity 


Uiuf  / 


/ 


Computed  Coma in 


Fig.  4.1.  Schematic  configuration  of  the  numerical  experiments  (left-hand  chart).  Periodicity  in  y -direction  for  the 
simulation  of  a  flat-plate  at  Q  —  45°,  the  shaded  area  corresponds  to  the  wake  of  each  plate. 


The  treatment  of  the  outflow  needs  a  closer  a  look  since  we  need  to  apply  a  proper  boundary  condition 
in  the  wake  region.  For  the  main  stream  flow  we  can  impose  a  zero  gradient  of  the  velocity  field  in  the  X 
direction: 


(4.1) 


dU_  _  dV_  _  dW 
dX  ~  dX  ~  dX 


However  these  equations  are  not  valid  in  the  wake  of  the  plate  and  need  to  be  improved.  In  most  cases,  the 
flow  in  the  wake  of  a  plate  is  turbulent.  For  Re  <  106,  under  carefully  controlled  conditions,  the  boundary 
layer  can  remain  laminar  past  the  trailing  edge.  However  because  the  velocity  profiles  in  the  wake  have  a 
point  of  inflexion,  the  wake  usually  becomes  turbulent  away  from  the  trailing  edge.  In  this  work  we  will 
confine  Re  <  105  and  assume  that  the  wake  remains  laminar  at  a  distance  of  2 L  behind  the  trailing  edge. 
For  the  wake  region,  the  outflow  boundary  condition  will  be  derived  using  Goldstein’s  calculations  of  the 
velocity  distribution  in  the  wake  of  a  finite  2-D  flat  plate  [11].  Hence,  in  order  to  apply  the  proper  boundary 
conditions,  it  is  important  to  set  the  wake  bounds.  We  will  assume  that  the  wake  region  has  a  negligible 
thickness,  beyond  the  width  of  the  plate,  L/cos(0 ),  in  the  y  direction  (8xy  «  0),  see  figure  4.2.  The  estimation 
of  the  thickness  in  the  2  direction  is  based  on  the  numerical  values  [11]  of  the  velocity  in  the  wake  at  a  given 
distance  behind  the  plate.  In  particular,  with  the  outflow  placed  at  2 L  after  the  trailing  edge  of  the  plate, 
and  fixing  the  wake  limit  where  \\u\  \  =  0.999,  for  a  Reynolds  number  of  104,  it  is  found  that  Sxz  «  0.1L. 
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XY  Plane 


XZ  plane 


Fig.  4.2.  Wake  dimensions  of  the  plate  at  zero  angle  of  yaw 


We  will  first  consider  the  plate  at  zero  angle  of  yaw.  The  gradient  of  the  velocity  component  parallel  to 
the  inflow  velocity,  and  that  perpendicular  to  the  plate,  will  be  improved  using  Goldstein’s  calculation  for 
the  2-D  plate.  The  notation  introduced  in  [11]  will  be  reproduced  here  in  order  to  clarify  the  subsequent 
expressions: 


Rep  —  U^L/p 
D  =  1. 328 pU^L  /  yfRep 
kd  =  D/pU^d 
A  =  1.328/Vi 
d  =  4L 
X,Z 

u,w 

x  =  X/d,  z  =  (Uood/vY^Z/d 
u  =  U/Uoo,  w  =  (Uood/vY^W/Uoo 


Reynolds  number  based  on  the  plate  length 

Viscous  drag  coefficient  from  Blasius  theory 

dimensionless  drag  coefficient 

Constant  of  integration 

representative  length  of  the  plate 

2-D  Cartesian  coordinates 

Dimensional  2-D  velocity  field 

dimensionless  coordinates 

dimensionless  components  of  the  velocity  field 


In  what  follows  we  will  only  need  the  component  of  the  velocity  perpendicular  to  the  plate  which  can 
be  written  with  a  second  order  approximation  as: 

where: 


(4.2) 


w  =  — 


A  A 

^9lM  +  x3/2 


(4.3) 


V  = 


y^X1 

9i(v)  =  ; 

92 (v)  = 


|(1  -rfy 


93(9)  =  ~r\e  v  ; 

9i(v)  =  -V^MO). 


The  gradient  of  U  in  the  X  direction,  will  be  calculated  in  terms  of  the  derivative  of  w  with  respect  to 
2.  With  the  help  of  the  mass  conservation  equation  we  can  express  the  U  derivative  as: 

dU  _  Uoo  dw  dp 
dX  d  dp  dz1 
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(4.4) 


dw  dr\  A  A 


dr]  dz  2x^lr]  +  ( 2x )2 
where  the  functions  are: 

9iy  =  (1~V2)e~^r]2; 

92V  =  fa*  r](r]2  -  3) erf  +  4  “  V2)\j^e~i7 

g3tl  =  (2 rj2  -  l)e_r?2 ; 

(4.5)  54,  =  -2e-v\ 


A  (g2ri  +  92r]  +  92rj)  + 


\/7rRe~i 


Although  expression  4.1  is  probably  a  good  outflow  condition  for  the  Vb-component  we  can  get  a  better 
representation  of  the  gradient  using  the  derivative  of  expression  4.2  with  respect  to  X ,  that  can  be  written 
as: 


(4.6) 


where, 


(4.7) 


dw  _  E/oo  [  dw  dw  Of] 
OX  2 dyJReL  dx  +  dr)  dx 


dr\  _  r] 
dx  2x  ’ 

2rj 

M92(v)  +9s(v)  +9iiv))  +  r~D— 

y/7rneL 


A 


dw 


dx  (2x)3/2 


9  iV 


3  A 


(2x)5/2 


The  benefits  of  the  boundary  conditions  that  we  have  just  derived  are  clearly  shown  in  figure  4.3.  In  this 
figure,  the  pressure  contour  lines  at  2  =  0  have  been  plotted  for  two  different  simulations  with  0  =  0°  and  a 
Reynolds  number  of  10000.  In  the  right-hand  chart  the  boundary  condition  4.1  has  been  used  for  the  whole 
outflow  boundary.  At  the  outflow,  in  the  wake  of  the  plate,  there  is  a  region  of  a  favorable  pressure  gradient 
that  accelerates  the  velocity  at  the  outlet  in  order  to  match  the  wrong  outflow  condition.  In  the  left-hand 
chart  the  boundary  conditions  4.4  and  4.6  have  been  used  in  the  wake  region  combined  with  4.1  outside  the 
wake.  As  a  result  of  this  boundary  condition  the  low  pressure  zone  in  the  wake  of  the  plate  disappears. 


p:  -1  .IE-02  1.5E-02  4.4E-02  7.4E-02  1.0E-01  1.3E-01 


p:  -0.039  -0.015  0.010  0.034  0.058  0.083  0.107  0.132 


Uinf 


Uinf 


Fig.  4.3.  Pressure  contour  lines  at  z=0  for  a  Reynolds  number  of  10000  using  a  zero  gradient  wake  condition  (right-hand 
chart)  and  the  improved  boundary  condition  (left-hand  chart)  for  9  =  0°. 
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The  solution  is  verified  by  comparing  the  u- velocity  in  the  middle  of  the  plate  with  the  Blasius  analytical 
solution  for  a  2-D  plate  (figure  4.4).  The  low  discrepancy  near  the  layer  edge  is  due  to  the  highly  stretched 
grid  used  in  this  simulation.  The  spatial  convergence  of  the  viscous  drag  coefficient  Df  versus  the  nominal 
grid  spacing  for  a  set  of  grids  with  an  increasing  number  of  points  in  the  2  direction  (regular  spacing)  is 
shown  in  figure  4.4,  where: 


Second  order  accuracy  is  clear;  the  coefficient  converges  to  a  value  approximately  3%  lower  (Df  =  0.020455) 
than  the  one  predicted  by  the  Blasius  theory  (Df  =  =  =  0.02099).  It  is  interesting  to  note  that  the 

accuracy  obtained  with  the  128x24x32  grid  is  the  same  as  that  obtained  with  the  32x24x32  stretched  grid, 
with  the  consequent  saving  in  computing  time. 


1.2 
1 
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5  0.6 
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0.2 
0 

0123456789  02468  10  12  14  16 

Scaled  Coordinate  y  Scaled  Mesh  Space  (h/ho)A2 

Fig.  4.4.  Simulation  comparison  with  Blasius  theory  at  the  middle  of  the  plate  with  Re  =  104  (left-hand  chart).  Conver¬ 
gence  of  the  viscous  drag  versus  nominal  grid  spacing  (right-hand  chart). 


When  the  inflow  is  at  some  angle  of  yaw,  the  outflow- wake  boundary  conditions  are  approximated  by 
rotating  the  zero  yaw  angle  solution.  This  approximation  is  only  valid  for  moderate  angles  of  yaw,  given  that 
the  characteristics  of  the  problem  do  not  diverge  too  much  from  the  main  stream.  However,  in  a  general 
situation  for  larger  angles  of  yaw,  such  as  those  considered  below,  this  assumption  is  no  longer  valid.  The 
effect  of  the  rotated  outflow  conditions  on  the  plate  has  been  checked  by  running  a  case  with  the  outflow  plane 
further  out.  The  flow  field  over  the  plate  is  not  perceptively  affected  by  the  rotated  boundary  conditions. 
In  fact,  when  comparing  both  results  the  difference  in  the  flow  over  the  plate  has  been  found  to  be  below 
the  2%.  If  we  define  (x',y',z')  as  a  frame  rotated  by  the  angle  of  yaw  0 ,  and,  if  (u',v',w')  are  the  velocity 
components  in  the  (x'^y'^z')  frame  we  have: 

u  =  u'  cos  a  +  v'  sin  a 
v  =  —  uf  sin  a  +  vf  cos  a 

(4.9)  w  =  wf 

Now,  the  gradients  dU/dX  and  dW/dX  can  be  easily  obtained  from  equation  4.9.  Note  also  that  at  yaw, 
the  effective  length  of  the  plate  changes  and  so  equation  4.2  has  to  be  modified  to  take  this  into  account. 

In  figure  4.5  the  drag  coefficient  versus  the  angle  of  yaw  is  shown  for  the  64x48x64  grid  with  a  Reynolds 
number  of  104.  In  this  case,  the  contribution  from  the  u  and  v  components  of  the  velocity  field  have  to  be 
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Angle  of  yaw 

Fig.  4.5.  Viscous  Drag  coefficient  versus  the  angle  of  yaw  for  a  64x48x64  stretched  grid  at  Re  =  104. 


taken  into  account  in  the  shear  stress  vector: 


(4.10) 


Df  = 


du  dv  . 

— -  cos  a  +  —  sin  a 
oz  oz 


z= 0 


dS. 


This  integral  is  computed  with  second  order  accuracy  using  the  midpoint  rule,  where  the  derivatives  are 
approximated  with  a  three  point  formula.  The  pressure  contours  at  2  =  0  for  two  different  angles  of  yaw 
(0  =  20°  and  0  =  45°)  are  shown  in  figure  4.6.  For  these  cases,  the  pressure  gradients  in  the  wake  of  the 
plates  at  the  outflow  can  also  be  seen.  However,  the  magnitude  of  these  gradients  are  small  compared  to 
those  that  appear  at  0  =  0°  and  the  zero  gradient  boundary  condition  for  the  wake  of  the  plate. 


Fig.  4.6.  Pressure  contour  lines  at  z=0  for  a  Reynolds  number  of  10000  for  0  =  20°  (left-hand  chart)  and  0  =  45° 
(right-hand  chart). 

In  order  to  capture  the  viscous  effects,  the  grids  used  in  this  work  are  highly  stretched  near  the  plate. 
Moreover,  the  grids  are  refined  (geometrically  stretched,  see  below)  near  the  plate  edges  to  reduce  the  large 
discretization  errors  in  those  zones  as  advocated  by  Thomas  et  al.  [22]  (see  for  example  figure  4.3).  The 
simulations  have  been  performed  over  two  different  grids  of  dimensions:  32x24x32  and  64x48x64.  These 
grids  use  a  stretching  of  the  form  hk+i  =  the  stretching  factor  /3  being  equal  to  1.3  and  1.2  respectively. 
It  is  interesting  to  note  that,  in  order  to  keep  the  h 2  solution  accuracy  the  stretching  factor  should  be 
(3  =  1  + 0(h).  In  the  following  simulations  the  number  of  multigrid  levels  has  been  fixed  so  that  the  coarsest 
level  has  4  planes. 
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Figure  4.7  shows  the  Z/2-norm  of  the  residual  versus  F(2,l)-cycles  with  an  x^-plane  implicit  smoother 
combined  with  Z  semicoarsening  for  several  angles  of  yaw  and  two  different  smoothers.  The  residual  norm 
is  reduced  by  four  orders  of  magnitude  in  the  first  five  cycles  in  the  finest  grid  in  all  cases  (corresponding 
to  a  convergence  rate  of  roughly  0.1  per  fine  grid  iteration).  This  average  convergence  rate  is  close  to  that 
obtained  for  the  3-D  Poisson  equation  using  a  plane  implicit  smoother  combined  with  semicoarsening  [17]. 
Moreover,  the  convergence  rate  is  also  independent  of  the  angle  of  the  non-alignment  parameter  for  angles 
0°  <  0  <  45°. 
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Fig.  4.7.  L2-norm  of  the  average  residual  versus  F(2,l)-cycles  with  an  xy-plane  implicit  smoother  combined  with  z 
semi  coarsening  with  a  lexicographic  (Lx)  and  a  four- color  (Fc)  ordering  of  planes,  and  several  angles  of  yaw.  The  Simulation 
was  performed  on  a  32x24x32  grid  and  a  Reynolds  number  of  104. 

Figure  4.8  shows  the  L2-norm  of  the  residual  versus  F(2,l)-cycles  for  a  64x48x64  grid.  The  residual 
norm  in  this  case  is  also  reduced  by  four  orders  of  magnitude  in  the  first  five  cycles  in  the  finest  grid  in 
all  cases.  Thus,  the  combination  of  plane-implicit  smoothers  with  semicoarsening  considered  here  exhibits 
a  textbook  convergence  rate,  i.e. ,  independent  of  the  problem  size,  grid  stretching  factor  and  angle  of  yaw. 
Although  this  problem  is  a  convection-dominated  non-aligned  flow  for  which  multigrid  schemes  might  have 
convergence  difficulties,  we  do  not  experience  these  difficulties  for  the  grids  and  Reynolds  numbers  used. 
However  in  a  general  situation  an  explicit  coarse-grid  correction  of  cross-characteristic  components  may  be 
needed  (see  for  example  [8]).  Moreover,  when  using  a  four-color  smoother  the  efficiency  of  the  algorithm  is 
not  reduced.  This  result  enables  a  viable  parallel  implementation  of  the  multigrid  algorithm  studied. 

We  have  studied  the  case  of  Re  =  105  and  have  obtained  a  convergence  rate  of  0.2  and  a  drag  coefficient 
within  5%  of  that  predicted  by  the  Blasius  solution.  The  results  indicate  that  a  finer  mesh  is  needed  to 
improve  the  accuracy  of  the  results.  This  will  be  the  subject  of  further  studies  that  would  exploit  the 
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Fig.  4.8.  L2-norm  of  the  average  residual  versus  F(2,l)-cycles  with  an  xy-plane  implicit  smoother  combined  with  z 
semicoarsening  with  a  lexicographic  (Lx)  and  a  four-color  (Fc)  ordering  of  planes  and  several  angles  of  yaw.  The  Simulation 
was  performed  on  a  64x48x64  grid  and  a  Reynolds  number  of  104. 


parallel  properties  of  the  algorithm. 

5.  Conclusions.  A  multigrid  algorithm  to  solve  the  incompressible  3-D  Navier-Stokes  equations  with 
severely  stretched  grids  and  non-aligned  flows  has  been  presented.  The  core  of  the  multigrid  algorithm  is 
a  coupled  plane-implicit  solver  combined  with  semicoarsening.  The  plane  solver  used  is  also  a  robust  2-D 
multigrid  algorithm  based  on  a  cell-implicit  smoother  combined  with  semicoarsening.  This  plane  smoother 
has  been  found  fully  robust  for  the  problem  sizes  covered  in  this  work.  Textbook  multigrid  convergence  has 
been  demonstrated  for  the  simulation  of  a  yawed  flat  plate  boundary  layer.  That  is,  the  convergence  rate  is 
independent  of  the  grid  size,  grid  stretching  factor  and  the  angle  of  yaw.  Moreover,  this  convergence  factor 
has  been  found  to  be  close  to  the  value  expected  for  elliptic  equations.  A  parallel  variant  of  the  smoothing 
process  based  in  a  four-color  ordering  of  planes  has  also  been  analyzed.  The  convergence  rate  does  not 
deteriorate  in  this  situation,  achieving  TMC  for  the  parallel  smoother.  The  outflow  boundary  treatment  has 
been  discussed  in  depth  in  this  report.  The  code  has  been  validated  with  Blasius  theory  and  an  accurate 
measurement  of  the  viscous  drag  coefficient  has  been  obtained  by  exploiting  the  robustness  of  the  solver. 
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